
Limit cluster management job concurrency #149

Merged
danielfrankcom merged 1 commit into main from chore/limit-cm-concurrency on May 20, 2025


@danielfrankcom
Contributor

This PR modifies the cluster management workflows so that only one job runs against each account at any given time. This helps prevent conflicts in which the cluster cleanup job deletes clusters that a testing job is still using.

I could have put the concurrency limit on the workflow, which would ensure that all jobs and cleanup steps finish before any other workflow run can begin. Instead, I put it on the job, so that other tasks such as formatting checks can complete and give quick feedback on the PR without potentially having to wait for other workflows to finish.
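A minimal sketch of the job-level approach, assuming GitHub Actions' `jobs.<job_id>.concurrency` key. This is not the PR's actual workflow file: the workflow name, job names, `run` commands, and the `vars.CLUSTER_ACCOUNT_ID` repository variable used to build one concurrency group per account are all hypothetical placeholders.

```yaml
# Hypothetical sketch: names, commands, and the account variable are placeholders.
name: cluster-management-tests

on:
  pull_request:

jobs:
  # No concurrency group, so this job starts immediately and reports back quickly.
  format-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run formatting checks here"

  # Jobs sharing this group run one at a time. With cancel-in-progress: false,
  # a newly queued job waits for the running one instead of cancelling it.
  cluster-tests:
    runs-on: ubuntu-latest
    concurrency:
      # Hypothetical: one group per account, keyed on a repository variable.
      group: cluster-management-${{ vars.CLUSTER_ACCOUNT_ID }}
      cancel-in-progress: false
    steps:
      - uses: actions/checkout@v4
      - run: echo "create clusters, run tests, and clean up clusters here"
```

Moving the same `concurrency` block to the top level of the workflow (alongside `on:` and `jobs:`) would serialize entire workflow runs instead, which is the alternative the description rejects because it would also hold back the formatting check.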

I tested this change by running two instances of the same workflow. The screenshot shows the second run's job waiting to proceed because a job is already running, while the formatting check was not blocked.

[Screenshot]

There is a small risk in the following scenario:

  • many jobs are queued at the same time for the same test suite
  • the tests are broken and don't clean up their clusters
  • all of the testing jobs are scheduled ahead of all of the cleanup jobs

If this happens, the account could run out of cluster capacity, causing later jobs in the queue to fail even though they would otherwise have succeeded. In that case, the cluster cleanup jobs will eventually run once all of the test jobs have failed, restoring the account to a clean state, and we can then rerun any jobs that should have passed. Given how many of these conditions must hold at once, this seems unlikely anyway.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT-0 license.

danielfrankcom merged commit df58a9a into main on May 20, 2025
24 of 26 checks passed
danielfrankcom deleted the chore/limit-cm-concurrency branch on May 20, 2025 at 21:01
vic-tsang pushed a commit that referenced this pull request on May 20, 2025
Co-authored-by: Daniel Frankcom <frankcom@amazon.com>
vic-tsang added a commit that referenced this pull request on May 20, 2025
* make all env variables to be the same across all languages for cluster management

* fixed env names for single cluster

* fixed env variables

* fixed go env name

* fixed single cluster region env name

* close single cluster client in java

* Limit cluster management job concurrency (#149)

Co-authored-by: Daniel Frankcom <frankcom@amazon.com>

---------

Co-authored-by: Victor Tsang <vitsangp@amazon.com>
Co-authored-by: Daniel Frankcom <daniel@frankcom.ca>
Co-authored-by: Daniel Frankcom <frankcom@amazon.com>
2 participants